{ggplot2}
A graphics framework for elegant plotting in R
class: inverse, center
background-image: url("img/darklight_RichardStrozynski.jpg")
background-size: contain
{ggplot2}R Image by Richard Strozynski
class: middle
{ggplot2}
consistent underlying grammar of graphics (Wilkinson, 2005)
very flexible, layered plot specification
theme system for polishing plot appearance
active and helpful community
(e.g. community.rstudio.com, R4DS Learning Community, Twitter)
class: inverse
Figure in Scherer et al. 2019 J. Anim. Ecol.
class: inverse, center, middle
Contribution to #TidyTuesday
class: inverse, center, middle
We use data from the National Morbidity and Mortality Air Pollution Study (NMMAPS),
filtered for the city of Chicago and the timespan January 1997 to December 2000.
chic <- readr::read_csv("https://raw.githubusercontent.com/Z3tt/R-Tutorials/master/ggplot2/chicago-nmmaps.csv")
chic$year <- factor(chic$year, levels = as.character(1997:2000))
tibble::glimpse(chic)
## Observations: 1,461
## Variables: 10
## $ city <chr> "chic", "chic", "chic", "chic", "chic", "chic", "chic...
## $ date <date> 1997-01-01, 1997-01-02, 1997-01-03, 1997-01-04, 1997...
## $ death <dbl> 137, 123, 127, 146, 102, 127, 116, 118, 148, 121, 110...
## $ temp <dbl> 36.0, 45.0, 40.0, 51.5, 27.0, 17.0, 16.0, 19.0, 26.0,...
## $ dewpoint <dbl> 37.500, 47.250, 38.000, 45.500, 11.250, 5.750, 7.000,...
## $ pm10 <dbl> 13.052268, 41.948600, 27.041751, 25.072573, 15.343121...
## $ o3 <dbl> 5.659256, 5.525417, 6.288548, 7.537758, 20.760798, 14...
## $ time <dbl> 3654, 3655, 3656, 3657, 3658, 3659, 3660, 3661, 3662,...
## $ season <fct> Winter, Winter, Winter, Winter, Winter, Winter, Winte...
## $ year <fct> 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997,...
{ggplot2}
Data → The raw data that you want to visualise
Layers geom_ and stat_ → The geometric shapes and statistical summaries representing the data
Aesthetics aes() → Aesthetic mappings of the geometric and statistical objects
Scales scale_ → Maps between the data and the aesthetic dimensions
Coordinate system coord_ → Maps data into the plane of the data rectangle
Facets facet_ → The arrangement of the data into a grid of plots
Visual themes theme() and theme_ → The overall visual defaults of a plot
ggplot()
We need to specify the data and the two variables we want to plot as aesthetics of the ggplot() call:
.pull-left[
There is only an empty panel because {ggplot2} doesn’t know how it should plot the data. ]
.pull-right[ ]
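The call described on this slide can be sketched as follows. The guard that builds a synthetic stand-in for `chic` (made-up values, not the NMMAPS data) is an assumption, added only so the snippet runs on its own.

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# Data and aesthetics only, no layer yet: this draws an empty panel
p <- ggplot(data = chic, mapping = aes(x = date, y = temp))
p
```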
ggplot()
We need to specify the data and the two variables we want to plot as aesthetics of the ggplot() call:
.pull-left[ Since almost every ggplot() takes the same arguments (data, mapping = aes(x, y)),
we can also write:
… or add the aesthetics outside the ggplot function:
]
.pull-right[ ]
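A sketch of the two shorter spellings mentioned on this slide. The `chic` stand-in is synthetic and only an assumption to keep the snippet runnable on its own.

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# Positional arguments: data first, then the mapping with x and y
p1 <- ggplot(chic, aes(date, temp))

# Aesthetics added outside the ggplot() call
p2 <- ggplot(chic) + aes(date, temp)
```

Both objects carry the same x and y mapping, so any layer added later behaves identically.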
geom_*() and stat_*()
By adding one or multiple layers we can tell {ggplot2} how to represent the data.
There are lots of built-in geometric elements (geoms) and statistical transformations (stats):
Adapted from https://ggplot2.tidyverse.org/reference/
… and several more in extension packages, e.g. ggforce, ggalt, ggridges, ggrepel, ggcorrplot, ggraph and ggdendro.
geom_line()
… or a line plot:
.pull-left[
]
.pull-right[ ]
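The code for this slide is not preserved here. A minimal sketch, assuming the same date and temp mapping as before, with a scatter plot for comparison (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# One point per observation
p_point <- ggplot(chic, aes(date, temp)) + geom_point()

# The same data connected as a line, ordered along x
p_line <- ggplot(chic, aes(date, temp)) + geom_line()
p_line
```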
geom_boxplot()
… or a box and whiskers plot:
.pull-left[
]
.pull-right[ ]
geom_boxplot()
We need to specify the variable as categorical (year), not as continuous (date):
.pull-left[
]
.pull-right[ ]
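A minimal sketch of the categorical version, assuming `year` is a factor as set up when the data were loaded (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# A factor on the x axis yields one box per level
p <- ggplot(chic, aes(year, temp)) + geom_boxplot()
p
```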
stat_summary()
Some layers focus on the statistical transformation rather than the visual appearance:
.pull-left[
ggplot(chic, aes(year, temp)) +
  geom_boxplot() +
*  stat_summary(
*    fun = mean,
*    geom = "point"
*  )
]
.pull-right[ ]
stat_ecdf()
You can also easily plot the empirical cumulative distribution function (ECDF) of a variable:
.pull-left[
]
.pull-right[ ]
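A minimal sketch, assuming the variable of interest is temp (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# Only x is mapped; stat_ecdf() computes the cumulative proportion for y
p <- ggplot(chic, aes(temp)) + stat_ecdf()
p
```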
stat_smooth()
You can specify the fitting method and the formula:
.pull-left[
ggplot(chic, aes(date, temp)) +
  geom_point() +
*  stat_smooth(
*    method = "gam",
*    formula = y ~ s(x, k = 100),
*    se = FALSE
*  )
Other methods such as method = "lm" for linear regressions and method = "glm" for generalized linear models are available as well. ]
.pull-right[ ]
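For comparison, a sketch of the linear variant mentioned above, assuming a plain y ~ x formula (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# Straight-line fit without the confidence ribbon
p <- ggplot(chic, aes(date, temp)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)
p
```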
aes()
Aesthetics of the geometric and statistical objects, such as
x, y, xmin, xmax, ymin, ymax, …
–
colors via color and fill
transparency via alpha
–
sizes via size and width
shapes via shape and linetype
–
In general, everything which maps to the data needs to be wrapped in aes()
while static arguments are placed outside the aes().
–
e.g.
geom_point(aes(color = season)) to color points based on the variable season
geom_point(color = "grey") to color all points in the same color
aes(color/fill/alpha/size/shape)
… and change the color and the shape based on season and year:
.pull-left[
ggplot(chic, aes(date, temp)) +
geom_point(
* aes(
* color = season,
* shape = year
* ),
size = 4,
alpha = 0.2
)]
.pull-right[]
aes(group)
You can create subsets of the data by specifying a grouping variable via group:
.pull-left[
However, for most applications you can simply specify the grouping using visual aesthetics
(color, fill, alpha, shape, linetype). ]
.pull-right[ ]
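The code for this slide is not preserved here. A minimal sketch, assuming year as the grouping variable (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# group splits the data without changing the look: one line per year
p <- ggplot(chic, aes(date, temp)) +
  geom_line(aes(group = year))
p
```

Replacing `group = year` with `color = year` gives the same split plus a legend.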
class: inverse, center, middle
scale_*()
One can use scale_*() to change properties of all the aesthetic dimensions mapped to the data.
Consequently, there are scale_*() functions for all aesthetics such as:
position via scale_x_*() and scale_y_*()
colors via scale_color_*() and scale_fill_*()
transparency via scale_alpha_*()
sizes via scale_size_*()
shapes via scale_shape_*() and scale_linetype_*()
… with extensions (*) such as
* continuous(), discrete(), reverse(), log10(), sqrt(), date(), time() for axes
* continuous(), discrete(), manual(), gradient(), hue(), brewer() for colors and fills
* continuous(), discrete(), manual(), ordinal(), identity(), date() for transparencies
* continuous(), discrete(), manual(), ordinal(), identity(), area(), date() for sizes
* continuous(), discrete(), manual(), ordinal(), identity() for shapes and linetypes
scale_x_*() and scale_y_*()
… and their properties such as the range, scaling, labels, and axis breaks:
.pull-left[
ggplot(chic, aes(date, temp)) +
geom_point(aes(color = season)) +
scale_x_date(
name = NULL,
* limits = c(
* as.Date("1997-01-01"),
* as.Date("1999-12-31")
* )
) +
* scale_y_log10(
name = "Temperature (°F)",
* breaks = c(1, 10, 100),
* labels = scales::scientific
)]
.pull-right[ ]
scale_x_*() and scale_y_*()
Some people are annoyed by the extra spacing around the data, but we can remove that:
.pull-left[
ggplot(chic, aes(date, temp)) +
geom_point(aes(color = season)) +
scale_x_date(
name = NULL,
* expand = c(0, 0)
) +
scale_y_continuous(
name = "Temperature (°F)",
* expand = c(0, 0)
)]
.pull-right[ ]
scale_color_*() and scale_fill_*()
… and change the title of the legend:
.pull-left[
ggplot(chic, aes(date, temp)) +
geom_point(aes(color = season)) +
scale_color_manual(
values = c(
"firebrick",
"dodgerblue",
"darkorchid",
"goldenrod"
),
* name = "Pretty\ncolored\nseasons:"
) ]
.pull-right[ ]
scale_shape_*()
… or remove the legend for a specific aesthetic:
.pull-left[
ggplot(chic, aes(date, temp)) +
geom_point(aes(color = season,
* shape = year)) +
scale_color_manual(
values = c(
"firebrick",
"dodgerblue",
"darkorchid",
"goldenrod"
),
* guide = "none"
) +
* scale_shape_discrete(
* solid = FALSE,
* name = "Year"
* )]
.pull-right[ ]
scale_color_*() and scale_fill_*(): Color Palettes
coord_*()
Coordinate systems combine the two position aesthetics (usually x and y) to produce a 2d position on the plot.
–
The meaning of the position aesthetics depends on the coordinate system used:
coord_cartesian(): the default with two fixed, perpendicularly oriented axes
coord_flip(): a Cartesian coordinate system with flipped axes
coord_fixed(): a Cartesian coordinate system with a fixed aspect ratio
–
coord_map(): map projections
coord_polar(): a polar coordinate system
coord_trans(): arbitrary transformations to x and y positions
coord_cartesian()
coord_cartesian() zooms into the plot without dropping any data. In case you want to remove the data points outside the range instead, use scale_y_continuous(limits = c(min, max)):
.pull-left[
]
.pull-right[
ggplot(chic, aes(year, temp)) +
geom_boxplot() +
scale_y_continuous( #<<
limits = c(50, 70) #<<
) #<< ]
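A sketch contrasting the two approaches. The `coord_cartesian()` call and its limits are an assumption about the missing left-hand code, and the `chic` stand-in is synthetic.

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# Zoom only: box statistics are still computed from all observations
p_zoom <- ggplot(chic, aes(year, temp)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(50, 70))

# Scale limits: observations outside 50-70 are dropped before the statistics
p_cut <- ggplot(chic, aes(year, temp)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(50, 70))

# Medians drawn by each plot (the second call warns about removed rows)
med_zoom <- layer_data(p_zoom)$middle
med_cut <- suppressWarnings(layer_data(p_cut)$middle)
```

Comparing `med_zoom` and `med_cut` shows whether the limits changed the summary statistics or only the visible range.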
coord_flip()
coord_flip() allows you to flip a Cartesian coordinate system:
.pull-left[
]
.pull-right[ ]
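A minimal sketch, assuming the box plot from earlier as the starting point (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# The mapping stays year -> x, temp -> y; only the drawing is flipped
p <- ggplot(chic, aes(year, temp)) +
  geom_boxplot() +
  coord_flip()
p
```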
coord_trans()
The difference between transforming the scales and transforming the coordinate system is that the coordinate transformation occurs after the statistics are computed:
.pull-left[
]
.pull-right[ ]
coord_map()
Since maps display spherical data, we must project the data via coord_map():
.pull-left[
]
.pull-right[
]
facet_*()
Faceting generates small multiples, each showing a different subset of the data:
Adapted from “ggplot2: Elegant Graphics for Data Analysis” by Hadley Wickham
facet_wrap()
facet_wrap() splits the data into small multiples based on one grouping variable:
.pull-left[
ggplot(chic, aes(temp, o3)) +
geom_point(aes(color = year)) +
facet_wrap(~ season,
*            scales = "free")
It is possible to let the axis ranges vary freely for each subset (to free only one axis, use "free_x" or "free_y"). ]
.pull-right[ ]
class: inverse, center, middle
theme() and theme_*()
theme()
To modify the theme of a plot use theme() in combination with element_*():
.pull-left[
ggplot(chic, aes(date, temp)) +
geom_point() +
* theme(
* axis.text = element_text(
* size = 15,
* face = "bold",
* color = "red"
* ),
* panel.grid.major.x = element_line(
* linetype = "dotted",
* color = "black"
* ),
* plot.background = element_rect(
* fill = "dodgerblue",
* color = "goldenrod"
* )
* )]
.pull-right[ ]
theme_*()
Use a built-in theme of {ggplot2}:
.pull-left[
]
.pull-right[ ]
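A minimal sketch, with theme_bw() chosen arbitrarily as one of the complete themes that ship with {ggplot2} (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# A complete theme replaces all visual defaults of this one plot
p <- ggplot(chic, aes(date, temp)) +
  geom_point() +
  theme_bw()
p
```

Other built-in options include theme_minimal(), theme_classic(), theme_light(), theme_dark() and theme_void().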
theme_*()
Using the argument base_size you can change the size of the text:
.pull-left[
]
.pull-right[ ]
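A minimal sketch, with theme_minimal() and a base size of 18 picked only for illustration (the `chic` stand-in is synthetic):

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# base_size is given in points; other text sizes are derived from it
p <- ggplot(chic, aes(date, temp)) +
  geom_point() +
  theme_minimal(base_size = 18)
p
```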
theme_set() and theme_update()
theme_set() replaces the active theme completely, while theme_update() modifies individual elements of it:
.pull-left[
(g <- ggplot(chic, aes(date, temp)) +
geom_point())
old <- theme_set(theme_grey())
g
*theme_update(
* panel.background = element_rect(
* fill = "tan",
* color = "black"
* )
*)
*g ]
.pull-right[ ]
class: inverse, center, middle
class: inverse, center, middle
To quickly add a title, use ggtitle():
.pull-left[
]
.pull-right[ ]
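A minimal sketch, assuming `g` is the scatter plot defined earlier. The title text is illustrative and the `chic` stand-in is synthetic.

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

g <- ggplot(chic, aes(date, temp)) + geom_point()

# ggtitle() is a shortcut for setting the plot title
p <- g + ggtitle("Temperatures in Chicago")
p
```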
{ggplot2} has a built-in structure for title, subtitle, caption and tags:
.pull-left[
g +
* theme(
* plot.title = element_text(
* size = 20,
* face = "bold",
* hjust = 1,
* margin = margin(15, 0, 15, 0)
* ),
* plot.caption = element_text(
* face = "italic"
* ),
* plot.tag.position = c(0.15, 0.75)
* )]
.pull-right[ ]
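The four text elements styled above are typically set with labs(). A sketch with illustrative label texts; the object name `g_labelled` is hypothetical and the `chic` stand-in is synthetic.

```r
library(ggplot2)

# Assumption: `chic` was loaded as shown earlier. If not, build a synthetic
# stand-in (made-up values, not the NMMAPS data) so the sketch still runs.
if (!exists("chic")) {
  chic <- data.frame(date = as.Date("1997-01-01") + 0:1460)
  chic$temp <- 50 - 25 * cos(2 * pi * (0:1460) / 365.25)
  chic$year <- factor(format(chic$date, "%Y"))
}

# Title, subtitle, caption and tag each have their own slot in labs()
g_labelled <- ggplot(chic, aes(date, temp)) +
  geom_point() +
  labs(
    title = "Temperatures in Chicago",
    subtitle = "Daily values, January 1997 to December 2000",
    caption = "Data: NMMAPS",
    tag = "Fig. 1"
  )
g_labelled
```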
annotate("text")
The annotate() function adds a single geom to the plot; its properties are passed as fixed values rather than mapped from a data frame:
.pull-left[
ggplot(chic, aes(date, temp)) +
geom_point() +
annotate(
"text",
x = as.Date("1998-07-01"),
y = 10,
label = "The Summer\nof 1998",
size = 3,
fontface = "bold",
color = "firebrick"
) +
* facet_wrap(~year, scales = "free_x")]
.pull-right[ ]
element_markdown() via {ggtext}
With the new {ggtext} package, it is possible to use Markdown and basic HTML within strings:
.pull-left[
*library(ggtext)
ggplot(chic, aes(date, temp)) +
geom_point(aes(color = season)) +
scale_color_manual(
name = NULL,
values = c("#0072B2", "#009E73", "#FF8423", "#B22222"),
* labels = c(
* "<b style='color:#0072B2'>Spring:<br></b><i>Apr to Jun<br></i>",
* "<b style='color:#009E73'>Summer:</b><br><i>Jul to Sep<br></i>",
* "<b style='color:#FF8423'>Autumn:</b><br><i>Oct to Dec<br></i>",
* "<b style='color:#B22222'>Winter:</b><br><i>Jan to Mar<br></i>"
* )
) +
* theme(legend.text = element_markdown())]
.pull-right[ ]
annotation_custom(grob)
The annotation_custom() function comes with ggplot2 and is designed to use a so-called grob as input:
.pull-left[
*img <- png::readPNG("./img/logo.png")
*logo <- grid::rasterGrob(img, interpolate = TRUE)
ggplot(chic, aes(date, temp)) +
* annotation_custom(
* logo,
* xmin = as.Date("1997-06-01"),
* xmax = as.Date("2000-06-01"),
* ymin = 0,
* ymax = 80
* ) +
geom_point(alpha = 0.2)]
.pull-right[ ]
class: inverse, center, middle
annotate("rect") or geom_rect()?
If you plot many elements, use geom_rect():
.pull-left[
*rect <- tibble::tibble(
* xmin = as.Date(c("1997-01-01",
* "1999-01-01")),
* xmax = as.Date(c("1997-12-31",
* "1999-12-31")),
* ymin = rep(-Inf, 2),
* ymax = rep(Inf, 2)
*)
ggplot(chic) +
* geom_rect(
* data = rect,
* aes(xmin = xmin, xmax = xmax,
* ymin = ymin, ymax = ymax),
* color = NA,
* alpha = 0.2
* ) +
geom_point(aes(date, temp))]
.pull-right[ ]
geom_vline()
We could also indicate new years by a vertical line:
.pull-left[
*lines <- tibble::tibble(
*  xintercept = seq(
*    as.Date("1997-01-01"),
*    as.Date("2001-01-01"),
*    by = "year")
*)
ggplot(chic, aes(date, temp)) +
* geom_vline(
* data = lines,
* aes(xintercept = xintercept),
* color = "grey30") +
  geom_point()
There are also geom_abline() and geom_hline(). ]
.pull-right[ ]